Photorealistic style transfer aims to transfer the artistic style of a reference image onto an input image or video while preserving photorealism. In this paper, we argue that it is the summary-statistics matching scheme in existing algorithms that leads to unrealistic stylization. To avoid the popular Gram loss, we propose ColoristaNet, a self-supervised style transfer framework consisting of a style removal part and a style restoration part. The style removal network strips the original image style, and the style restoration network recovers it in a supervised manner. Meanwhile, to address the shortcomings of current feature transformation methods, we propose decoupled instance normalization, which decomposes feature transformation into style whitening and restylization. It works well within ColoristaNet and transfers image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrate that ColoristaNet achieves better stylization effects than state-of-the-art algorithms.
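
To make the decoupled instance normalization idea concrete, here is a minimal PyTorch sketch of the two halves described above: style whitening removes per-channel feature statistics, and restylization re-applies statistics taken from a style feature map. The function names and the choice of mean/std statistics are assumptions for illustration; the paper's actual transformation may use a different parameterization.

```python
import torch

def style_whitening(feat, eps=1e-5):
    # Remove per-channel style statistics (mean/std) from content features,
    # analogous to the "style whitening" half of decoupled instance normalization.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return (feat - mean) / std

def restylization(whitened, style_feat, eps=1e-5):
    # Re-apply the statistics of the style features to the whitened content:
    # the "restylization" half, here sketched as an AdaIN-style affine transfer.
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return whitened * s_std + s_mean

content = torch.randn(1, 64, 32, 32)   # toy content feature map
style = torch.randn(1, 64, 32, 32)     # toy style feature map
out = restylization(style_whitening(content), style)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```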
Sequential patterns in data are at the core of various time-series forecasting tasks. Deep learning models greatly outperform many traditional models, but these black-box models generally lack interpretability for prediction and decision making. To reveal underlying trends with understandable mathematical expressions, scientists and economists tend to use partial differential equations (PDEs) to explain the highly nonlinear dynamics of sequential patterns. However, this usually requires domain-expert knowledge and a series of simplifying assumptions, which are not always practical and can deviate from the ever-changing world. Is it possible to learn differential relations from data dynamically to explain time-evolving dynamics? In this work, we propose a learning framework that automatically obtains interpretable PDE models from sequential data. In particular, the framework consists of learnable differential blocks, named $P$-blocks, which are proven to be able to approximate complex continuous functions that evolve over time. Moreover, to capture shifts in the dynamics, the framework introduces a meta-learning controller to dynamically optimize the hyperparameters of the hybrid PDE models. Extensive experiments on time-series forecasting of financial, engineering, and health data show that our model can provide valuable interpretability and achieve performance comparable to state-of-the-art models. From empirical studies, we find that learning a few differential operators can capture the major trends of sequential dynamics without excessive computational complexity.
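
As an illustration of what a learnable differential block might look like, the hypothetical sketch below combines fixed finite-difference stencils with learnable coefficients to parameterize the right-hand side of a 1D PDE. The class name `PBlock` and the choice of candidate terms are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn

class PBlock(nn.Module):
    """Hypothetical sketch of a learnable differential block: fixed
    finite-difference stencils estimate u_x and u_xx, and learnable
    coefficients combine candidate terms into a PDE right-hand side du/dt."""
    def __init__(self, dx=0.1):
        super().__init__()
        # Central-difference stencils for first and second derivatives.
        self.register_buffer("d1", torch.tensor([[[-0.5, 0.0, 0.5]]]) / dx)
        self.register_buffer("d2", torch.tensor([[[1.0, -2.0, 1.0]]]) / dx**2)
        # Learnable coefficients for candidate terms: u, u_x, u_xx, u * u_x.
        self.coef = nn.Parameter(torch.zeros(4))

    def forward(self, u):                       # u: (batch, 1, grid)
        ux = nn.functional.conv1d(u, self.d1, padding=1)
        uxx = nn.functional.conv1d(u, self.d2, padding=1)
        terms = torch.stack([u, ux, uxx, u * ux], dim=-1)
        return (terms * self.coef).sum(-1)      # estimated du/dt

u = torch.randn(8, 1, 64)
print(PBlock()(u).shape)  # torch.Size([8, 1, 64])
```

Inspecting the learned `coef` after training is what yields the interpretable symbolic form of the recovered equation.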
Deep learning models have achieved promising disease prediction from patients' electronic health records (EHR). However, most models are developed under the i.i.d. assumption, failing to account for unknown distribution shifts, which degrades the generalization ability of deep learning models on out-of-distribution (OOD) data. In this setting, spurious statistical correlations that may change across environments will be exploited, which can lead to sub-optimal performance of deep learning models. Unstable correlations between procedures and diagnoses in the training distribution can cause spurious correlations between historical EHR and future diagnoses. To address this problem, we propose a causal representation learning method called Causal Healthcare Embedding (CHE). CHE aims to eliminate spurious statistical relationships by removing the dependencies between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence Criterion (HSIC) to measure the independence between the embedded diagnosis and procedure features. Based on a causal-view analysis, we perform a sample-weighting technique to get rid of such spurious relationships, enabling stable learning on EHR across different environments. Moreover, our proposed CHE method can serve as a flexible plug-in module that enhances existing deep learning models on EHR. Extensive experiments on two public datasets against five state-of-the-art baselines show that CHE can substantially improve the prediction accuracy of deep learning models on out-of-distribution data. Furthermore, interpretability studies show that CHE can successfully leverage causal structures to reflect more reasonable contributions of historical records to predictions.
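
The HSIC penalty mentioned above has a standard biased estimator, trace(K H L H) / (n - 1)^2, where K and L are kernel matrices over the two feature sets and H is the centering matrix. Below is a minimal PyTorch version, with toy diagnosis/procedure embeddings standing in for the real EHR features; the kernel choice and bandwidth are illustrative assumptions.

```python
import torch

def gaussian_kernel(x, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix for a batch of feature vectors.
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)^2, where H centers
    the kernel matrices. Values near 0 indicate independence."""
    n = x.size(0)
    k, l = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    h = torch.eye(n) - torch.ones(n, n) / n
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

diag = torch.randn(128, 16)   # toy diagnosis embeddings
proc = torch.randn(128, 16)   # toy procedure embeddings
print(hsic(diag, proc))       # penalize this to encourage independence
```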
Modeling users' dynamic preferences from historical behaviors lies at the core of modern recommender systems. Due to the diversity of user interests, recent advances propose multi-interest networks that encode historical behaviors into multiple interest vectors. In real scenarios, the items corresponding to the captured interests are usually retrieved together to get exposure and are collected into training data, producing dependencies among the interests. Unfortunately, multi-interest networks may mistakenly concentrate on these subtle dependencies among the captured interests. Misled by such dependencies, spurious correlations between irrelevant interests and targets are captured, leading to unstable prediction results when the training and test distributions do not match. In this paper, we introduce the widely used Hilbert-Schmidt Independence Criterion (HSIC) to measure the degree of independence among the captured interests, and empirically show that a continuous increase of HSIC may harm model performance. Based on this, we propose a novel multi-interest network, named DEep Stable Multi-Interest Learning (DESMIL), which tries to eliminate the influence of subtle dependencies among the captured interests by learning weights for training samples from a causal perspective. We conduct extensive experiments on public recommendation datasets, a large-scale industrial dataset, and a synthetic dataset that simulates out-of-distribution data. Experimental results demonstrate that our proposed DESMIL outperforms state-of-the-art models. Moreover, we conduct comprehensive model analysis to reveal, to a certain extent, why DESMIL works.
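
A rough sketch of the sample-weighting idea follows: learn per-sample weights that shrink the measured dependence between two captured interest vectors, while a regularizer keeps the weights close to uniform. A linear-kernel cross-covariance stands in for HSIC to keep the example short; the actual DESMIL objective and optimization may differ.

```python
import torch

def weighted_dependence(xi, xj, w):
    # Linear-kernel surrogate of HSIC: squared Frobenius norm of the
    # weighted cross-covariance between two captured interest vectors.
    w = w / w.sum()
    xi_c = xi - (w[:, None] * xi).sum(0)
    xj_c = xj - (w[:, None] * xj).sum(0)
    cov = (w[:, None] * xi_c).T @ xj_c
    return cov.pow(2).sum()

n, d = 256, 8
interest_a, interest_b = torch.randn(n, d), torch.randn(n, d)
log_w = torch.zeros(n, requires_grad=True)        # learnable sample weights
opt = torch.optim.Adam([log_w], lr=0.05)
for _ in range(100):
    w = log_w.softmax(0) * n                      # positive weights, mean ~1
    loss = weighted_dependence(interest_a, interest_b, w) \
         + 0.01 * (w - 1).pow(2).mean()           # stay close to uniform
    opt.zero_grad(); loss.backward(); opt.step()
print(w.min().item(), w.max().item())
```

The learned weights would then re-weight the recommendation training loss, down-weighting samples that carry the spurious interest dependencies.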
By adding exit layers to deep learning networks, early exit can terminate inference early with accurate results. However, the passive decision of whether to exit or continue to the next layer has to be made layer by layer, passing through every pre-placed exit layer until the network finally exits. In addition, it is hard to adjust the configuration of the computing platform while the inference proceeds. By incorporating a low-cost prediction engine, we propose a Predictive Exit framework for computation- and energy-efficient deep learning applications. Predictive Exit forecasts where the network will exit (i.e., it establishes the number of remaining layers needed to complete the inference), which effectively reduces the network's computation cost by exiting on time without running every pre-placed exit layer. Moreover, according to the number of remaining layers, proper computing configurations (i.e., frequency and voltage) are selected to execute the network, further saving energy. Extensive experimental results demonstrate that Predictive Exit achieves up to 96.2% computation reduction and 72.9% energy saving compared with classic deep learning networks, and 12.8% computation reduction and 37.6% energy saving compared with state-of-the-art early exit strategies, given the same inference accuracy and latency.
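
A hypothetical sketch of the prediction engine: a tiny head reads early-layer features, forecasts which exit the sample will take, and the forecast indexes a frequency/voltage table for the remaining layers. All names and the DVFS mapping below are assumptions; the paper's engine and platform interface are not specified here.

```python
import torch
import torch.nn as nn

NUM_EXITS = 4
# Assumed mapping: an early predicted exit leaves few layers to run, so a
# low frequency/voltage level still meets the latency budget.
DVFS_LEVELS = {0: "low-freq/low-volt", 1: "mid", 2: "mid", 3: "high-freq/high-volt"}

class ExitPredictor(nn.Module):
    """Low-cost predictor over early features: pooled vector -> exit logits."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(feat_dim, NUM_EXITS))
    def forward(self, feat):
        return self.head(feat)           # logits over candidate exit points

predictor = ExitPredictor()
early_feat = torch.randn(1, 64, 16, 16)  # features after the first block
exit_id = predictor(early_feat).argmax(-1).item()
print(f"predicted exit {exit_id}: skip intermediate exit layers,"
      f" set platform to {DVFS_LEVELS[exit_id]}")
```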
Partial differential equations (PDEs) fitted to scientific data can represent physical laws with interpretable mechanisms for various mathematically oriented subjects. Data-driven discovery of PDEs from scientific data has thrived as a new attempt to model complex phenomena in nature, but the effectiveness of current practice is typically limited by the scarcity of data and the complexity of the phenomena. In particular, discovering PDEs with highly nonlinear coefficients from low-quality data has remained largely under-explored. To deal with this challenge, we propose a novel physics-guided learning method, which can not only encode observational knowledge such as initial and boundary conditions, but also incorporate fundamental physical principles and laws to guide model optimization. We empirically demonstrate that the proposed method is more robust against data noise and sparsity, and can reduce the estimation error by a large margin. Moreover, for the first time, we are able to discover PDEs with highly nonlinear coefficients. With promising performance, the proposed method pushes forward the boundary of the PDEs that can be found by machine learning models for scientific discovery.
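
To illustrate how observational knowledge can guide optimization, the sketch below fits a small network u(x, t) to noisy observations while adding the residual of a candidate PDE (Burgers' equation as a stand-in) and a boundary condition as extra loss terms. This is a generic physics-informed training sketch under assumed names, not the paper's specific method.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
nu = nn.Parameter(torch.tensor(0.1))             # unknown PDE coefficient

def pde_residual(xt):
    # Residual of u_t + u * u_x - nu * u_xx = 0 via automatic differentiation;
    # column 0 of xt is x, column 1 is t.
    xt = xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    return u_t + u * u_x - nu * u_xx

obs_xt, obs_u = torch.rand(64, 2), torch.rand(64, 1)           # toy observations
bc_xt = torch.cat([torch.zeros(16, 1), torch.rand(16, 1)], 1)  # x = 0 boundary
loss = (net(obs_xt) - obs_u).pow(2).mean() \
     + pde_residual(torch.rand(256, 2)).pow(2).mean() \
     + net(bc_xt).pow(2).mean()                  # boundary condition u(0, t) = 0
loss.backward()
print(float(loss))
```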
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, inputs, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 mIoU higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, setting a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
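
Finding 1) can be illustrated with a short sketch of token-relation distillation: instead of matching CLS tokens or raw features, the student matches the teacher's softmax-normalized token-to-token affinity map. A side benefit visible in the code is that teacher and student can have different embedding widths, since only the N x N relation maps are compared. Plain dot-product affinities here are a simplification of the paper's relation targets.

```python
import torch
import torch.nn.functional as F

def token_relations(tokens, temperature=1.0):
    # Pairwise token-to-token affinities, softmax-normalized per query token.
    sim = tokens @ tokens.transpose(-2, -1) / tokens.size(-1) ** 0.5
    return F.log_softmax(sim / temperature, dim=-1)

def relation_distill_loss(student_tokens, teacher_tokens):
    # KL divergence between teacher and student token-relation maps.
    s = token_relations(student_tokens)
    with torch.no_grad():
        t = token_relations(teacher_tokens)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")

student = torch.randn(2, 196, 192)   # ViT-Tiny-like token embeddings
teacher = torch.randn(2, 196, 768)   # ViT-Base-like teacher tokens
print(relation_distill_loss(student, teacher))
```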
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
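
A loose sketch of the implicit-alignment idea: both camera tokens and LiDAR tokens receive positional embeddings computed from 3D coordinates by a shared MLP, after which a plain transformer layer can attend across modalities without any explicit view transformation. Tensor shapes, the coordinate sources, and the shared MLP are illustrative assumptions, not CMT's exact architecture.

```python
import torch
import torch.nn as nn

# Shared MLP maps 3D coordinates to positional embeddings for both modalities.
pos_mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 256))

img_tokens = torch.randn(1, 600, 256)      # camera feature tokens
pts_tokens = torch.randn(1, 1024, 256)     # LiDAR feature tokens
img_pts_3d = torch.randn(1, 600, 3)        # 3D points sampled along camera rays
lidar_pts_3d = torch.randn(1, 1024, 3)     # raw LiDAR point coordinates

# Adding position encodings from a common 3D space aligns the two token sets
# implicitly, so one transformer layer can fuse them.
tokens = torch.cat([img_tokens + pos_mlp(img_pts_3d),
                    pts_tokens + pos_mlp(lidar_pts_3d)], dim=1)
fused = nn.TransformerEncoderLayer(256, 8, batch_first=True)(tokens)
print(fused.shape)   # torch.Size([1, 1624, 256])
```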
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
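
A minimal sketch of what NAIVEATTACK amounts to, under the description above: a fixed trigger patch is stamped onto a fraction of the raw images (with labels flipped to the target class) before distillation starts, so the backdoor is baked into the synthetic dataset. The patch size, location, and poisoning rate are illustrative choices, not the paper's exact settings.

```python
import torch

def naive_attack(images, labels, target_class=0, poison_frac=0.1):
    """Hypothetical sketch of the NAIVEATTACK idea: stamp a fixed trigger
    patch onto a fraction of the raw images (and relabel them) *before*
    dataset distillation runs, rather than at model-training time."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(len(images) * poison_frac)
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -4:, -4:] = 1.0        # 4x4 white square, bottom-right corner
    labels[idx] = target_class
    return images, labels

imgs, lbls = torch.rand(100, 3, 32, 32), torch.randint(0, 10, (100,))
poisoned_imgs, poisoned_lbls = naive_attack(imgs, lbls)
# ...these then feed into the distillation procedure in place of clean data.
print((poisoned_lbls == 0).sum().item())
```

DOORPING, by contrast, would update the trigger itself inside the distillation loop rather than fixing it up front, which is why it reaches higher ASR in the paper's evaluation.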
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, in line with the easy-to-hard law of the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and that both the MS and PMT modules improve the model's performance.
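
A small sketch of the progressive multi-task idea as described: an easier coarse quality-classification head and a harder score-regression head share a backbone, and the loss weight shifts from the easy task to the hard one as training proceeds. The heads, the linear schedule, and all dimensions below are assumptions for illustration, not the paper's exact PMT module.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
cls_head = nn.Linear(128, 5)      # coarse quality levels (easy task)
reg_head = nn.Linear(128, 1)      # continuous quality score (hard task)

def pmt_loss(x, level, score, step, total_steps):
    # Loss weight alpha moves from the easy classification task toward the
    # hard regression task over training, mimicking easy-to-hard learning.
    feat = backbone(x)
    alpha = step / total_steps                      # 0 -> 1 as training proceeds
    easy = nn.functional.cross_entropy(cls_head(feat), level)
    hard = nn.functional.mse_loss(reg_head(feat).squeeze(-1), score)
    return (1 - alpha) * easy + alpha * hard

x = torch.rand(16, 3, 32, 32)
level = torch.randint(0, 5, (16,))
score = torch.rand(16)
print(pmt_loss(x, level, score, step=100, total_steps=1000))
```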